Topics in Tweets: A User Study of Topic Coherence Metrics for Twitter Data

نویسندگان

  • Anjie Fang
  • Craig MacDonald
  • Iadh Ounis
  • Philip Habel
چکیده

Twitter offers scholars new ways to understand the dynamics of public opinion and social discussions. However, in order to understand such discussions, it is necessary to identify coherent topics that have been discussed in the tweets. To assess the coherence of topics, several automatic topic coherence metrics have been designed for classical document corpora. However, it is unclear how suitable these metrics are for topic models generated from Twitter datasets. In this paper, we use crowdsourcing to obtain pairwise user preferences of topical coherences and to determine how closely each of the metrics align with human preferences. Moreover, we propose two new automatic coherence metrics that use Twitter as a separate background dataset to measure the coherence of topics. We show that our proposed Pointwise Mutual Information-based metric provides the highest levels of agreement with human preferences of topic coherence over two Twitter datasets.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A High-Performance Model based on Ensembles for Twitter Sentiment Classification

Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people of the world intensity use social networks like Twitter. It supports users to publish tweets to tell what they are thinking about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...

متن کامل

Incorporating Tweet Relationships into Topic Derivation

With its rapid users growth, Twitter has become an essential source of information about what events are happening in the world. It is critical to have the ability to derive the topics from Twitter messages (tweets), that is, to determine and characterize the main topics of the Twitter messages (tweets). However, tweets are very short in nature and therefore the frequency of term co-occurrences...

متن کامل

Temporal Classification and Visualization of Topics in a Twitter Search Interface

Searching within Twitter is a challenging task; the short and cryptic nature of tweets leads to search results sets that may include information on many different topics. While many topic modelling approaches exist to extract the salient topics from the tweets, what is missing is a method for temporally classifying the topics and showing these to a searcher to help them understand the makeup of...

متن کامل

A Model for Mining Public Health Topics from Twitter

We present the Ailment Topic Aspect Model (ATAM), a new topic model for Twitter that associates symptoms, treatments and general words with diseases (ailments). We train ATAM on a new collection of 1.6 million tweets discussing numerous health related topics. ATAM isolates more coherent ailments, such as influenza, infections, obesity, as compared to standard topic models. Furthermore, ATAM mat...

متن کامل

Detection of Twitter Users' Attitudes about Flu Vaccine based on the Content and Sentiment Analysis of the Sent Tweets

Introduction: The influenza vaccine is one of the controversial challenges in today's societies. Considering the importance of using the flu vaccine in preventing the spread of influenza virus, the Twitter network, as a rich source of data, provides suitable conditions for research in this field to examine the attitudes of different people about this vaccine. The results in one hand will help h...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016